
BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version)#65133

Merged
mroeschke merged 5 commits into pandas-dev:main from jorisvandenbossche:old-pytz on Apr 10, 2026

@jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Apr 9, 2026

When we made pytz an optional dependency (#59089), we also bumped its minimum version (and later bumped it once more in #62241). As a result, reading parquet files with timezone-aware data fails when the installed pytz is older than this required minimum (the bug reported in #64978).

We could solve this by improving the error message (so it is clear that you have to update pytz), but I also don't think there is a real need to bump the minimum version here: pytz is mostly in maintenance mode AFAIK, so I assume the newer versions mostly update the tz data.

For tzdata we actually decided to remove the minimum version altogether (#63335), but since pytz also provides an API in addition to the tz data, I kept the minimum version we had in pandas 2.x. That should at least avoid problems for people who upgrade from pandas 2 to 3 without upgrading pytz.
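As a minimal sketch of what a minimum-version check for an optional dependency boils down to, the snippet below compares a pytz version string against the 2020.1 threshold mentioned in the follow-up comment. The helper names and the simplified numeric parsing are assumptions for illustration; pandas uses its own optional-dependency machinery, which is not reproduced here.

```python
from __future__ import annotations

# Hypothetical constant: the 2020.1 pytz threshold referenced in this PR.
PYTZ_MIN_VERSION = "2020.1"


def parse_calver(version: str) -> tuple[int, ...]:
    """Parse a pytz-style calendar version such as "2024.1" or "2023.3.post1".

    Only the leading purely numeric components are kept, so suffixes like
    "post1" are ignored. This is a simplification, not full PEP 440 parsing.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = PYTZ_MIN_VERSION) -> bool:
    """Return True if `installed` is at least `minimum` (hypothetical helper)."""
    return parse_calver(installed) >= parse_calver(minimum)


print(meets_minimum("2019.3"))  # False: older than the minimum
print(meets_minimum("2024.1"))  # True
```

Comparing integer tuples rather than raw strings matters here: as strings, "2020.10" sorts before "2020.9", while `(2020, 10) > (2020, 9)` compares correctly.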

@jorisvandenbossche jorisvandenbossche added this to the 3.0.3 milestone Apr 9, 2026
@jorisvandenbossche jorisvandenbossche added the Timezones (Timezone data dtype) and IO Parquet (parquet, feather) labels Apr 9, 2026
@jorisvandenbossche jorisvandenbossche changed the title BUG: read parquet files with older pytz BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version) Apr 9, 2026
@jorisvandenbossche
Member Author

For the case where you still have a pytz older than 2020.1, I also want to improve the behaviour or the error message. At the moment, there are many places where we don't properly check that we have a pytz object (we use treat_tz_as_pytz without checking that pytz can be imported).
That means some parts of our API sort of work with an older pytz version, although they might give wrong results, e.g.:

>>> import pandas as pd
>>> import pytz
>>> pd.Timestamp(2012, 1, 1).tz_localize("UTC").tz_convert(pytz.timezone("Europe/Brussels"))
Timestamp('2012-01-01 00:18:00+0018', tz='Europe/Brussels')
# should be Timestamp('2012-01-01 01:00:00+0100', tz='Europe/Brussels')
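An old pytz object can slip through because a duck-typed check only inspects the object's attributes and never consults the installed pytz version. A minimal sketch, assuming a check based on the private `_utc_transition_times` and `_transition_info` attributes carried by pytz's DST-aware timezones; `looks_like_pytz` is a hypothetical stand-in for the idea behind `treat_tz_as_pytz`, not its actual source, and `FakeOldPytzZone` is a made-up class imitating a timezone from an old pytz release.

```python
from datetime import timedelta, timezone, tzinfo


def looks_like_pytz(tz: tzinfo) -> bool:
    """Hypothetical duck-typed check in the spirit of treat_tz_as_pytz.

    Only attributes are inspected; the installed pytz version is never
    checked, so a timezone object from a too-old pytz passes just the same.
    """
    return hasattr(tz, "_utc_transition_times") and hasattr(tz, "_transition_info")


class FakeOldPytzZone(tzinfo):
    """Made-up stand-in for a timezone created by an old pytz release."""

    _utc_transition_times = []
    _transition_info = []

    def utcoffset(self, dt):
        return timedelta(0)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "FAKE"


print(looks_like_pytz(FakeOldPytzZone()))  # True, whatever the pytz version
print(looks_like_pytz(timezone.utc))  # False: a stdlib timezone lacks these attributes
```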

Other parts, however, raise an error (which is what read_parquet ran into).

I could easily make read_parquet "work" as well (returning those wrong values), but it seems better to raise a proper error message up front when a pytz timezone is used with a pytz version that is too old.

That is what I added in the last commit (but since this is somewhat of a breaking change, I could also keep it for a separate PR for 3.1).
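A minimal sketch of the "raise up front" behaviour, assuming a hypothetical `validate_tz` function that receives the installed pytz version as a string. It detects pytz timezones by the module that defines the tz class, which is an assumption made for illustration; the check actually added in the PR's last commit may work differently.

```python
from __future__ import annotations

from datetime import timezone, tzinfo

MIN_PYTZ = (2020, 1)  # hypothetical: the 2020.1 threshold discussed above


def _is_pytz_tz(tz: tzinfo) -> bool:
    # Assumption: pytz timezone classes live in modules under the "pytz" package.
    return type(tz).__module__.split(".")[0] == "pytz"


def validate_tz(tz: tzinfo, installed_pytz: str | None) -> tzinfo:
    """Raise early if a pytz timezone is used with a too-old (or missing) pytz."""
    if _is_pytz_tz(tz):
        raw = installed_pytz or "0"
        parts = tuple(int(p) for p in raw.split(".")[:2] if p.isdigit())
        if parts < MIN_PYTZ:
            raise ImportError(
                f"pytz >= {MIN_PYTZ[0]}.{MIN_PYTZ[1]} is required to use pytz "
                f"timezones, but found {installed_pytz!r}"
            )
    return tz


class FakeOldPytzZone(tzinfo):
    """Made-up stand-in for a pytz timezone object."""


FakeOldPytzZone.__module__ = "pytz.tzinfo"  # pretend the class comes from pytz

validate_tz(timezone.utc, None)  # non-pytz timezones pass through untouched
try:
    validate_tz(FakeOldPytzZone(), "2018.9")
except ImportError as err:
    print(err)
```

Failing at the point where the timezone is validated, rather than deep inside a conversion, is what turns the silently wrong LMT-style results into an actionable error.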

@jbrockmendel
Member

Is it viable to write a test for this?

@jorisvandenbossche
Member Author

Not directly, since it is for the case where pytz is too old, and we don't have a CI build for that. I don't know if it would be possible to mock the pytz version in a test?
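For a helper that looks up the installed version at call time through `importlib.metadata.version`, the version can be faked with `unittest.mock.patch` without installing an old pytz. This is only a minimal sketch around a hypothetical `pytz_is_recent_enough` helper; whether pandas' own import and version-checking code can be patched the same way is not established in this thread.

```python
import importlib.metadata
from unittest import mock

# Hypothetical threshold mirroring the 2020.1 minimum discussed in this PR.
MIN_PYTZ = (2020, 1)


def pytz_is_recent_enough() -> bool:
    """Hypothetical helper: True if the installed pytz meets MIN_PYTZ."""
    try:
        # Looked up via the module attribute at call time, so patching
        # "importlib.metadata.version" takes effect here.
        raw = importlib.metadata.version("pytz")
    except importlib.metadata.PackageNotFoundError:
        return False
    parts = tuple(int(p) for p in raw.split(".")[:2] if p.isdigit())
    return parts >= MIN_PYTZ


def test_old_pytz_is_rejected():
    # Pretend an old pytz release is installed, without installing anything.
    with mock.patch("importlib.metadata.version", return_value="2018.9"):
        assert not pytz_is_recent_enough()


def test_recent_pytz_is_accepted():
    with mock.patch("importlib.metadata.version", return_value="2024.1"):
        assert pytz_is_recent_enough()
```

The patch only works because the helper calls `importlib.metadata.version` through the module; a helper that did `from importlib.metadata import version` would need the patch target changed to its own module.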

@jbrockmendel
Member

> I don't know if it would be possible to mock the pytz version in a test?

I don't think so, especially if pyarrow is going to move to zoneinfo in the foreseeable future. Thanks for taking a look.

@mroeschke mroeschke merged commit 398e59c into pandas-dev:main Apr 10, 2026
45 checks passed
@mroeschke
Member

Thanks @jorisvandenbossche

@lumberbot-app

lumberbot-app bot commented Apr 10, 2026

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
     git checkout 3.0.x
     git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
     git cherry-pick -x -m1 398e59c04ac30f4930bdbcdb0208e93e71d5a25a
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
     git commit -am 'Backport PR #65133: BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version)'
  4. Push to a named branch:
     git push YOURFORK 3.0.x:auto-backport-of-pr-65133-on-3.0.x
  5. Create a PR against branch 3.0.x. I would have named this PR:

"Backport PR #65133 on branch 3.0.x (BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version))"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

Sharl0tteIsTaken added a commit to Sharl0tteIsTaken/pandas that referenced this pull request Apr 12, 2026
…-comparison

* upstream/main:
  PERF: use lookup instead of hash_inner_join for merge with unique right keys (pandas-dev#64691)
  BUG : update `SeriesGroupBy.ohlc()` to honor `as_index=False` (pandas-dev#65141)
  PERF: Use DataFrame-level reductions in DataFrame.agg with list of funcs (pandas-dev#65031)
  DOC: document required external libraries in read_* I/O docstrings (pandas-dev#65143)
  DOC: improve MultiIndex.is_monotonic_increasing/decreasing docstrings (pandas-dev#65154)
  BUG: Raise ValueError for non-boolean numeric_only in DataFrame/Series reductions (GH#53098) (pandas-dev#65131)
  BUG: Timedelta.round() raises ZeroDivisionError when internal unit is 's' and target frequency is sub-second (pandas-dev#64836)
  ENH: Add replace method to Index (closes pandas-dev#19495) (pandas-dev#65099)
  PERF: improve StringArray.isna (pandas-dev#57733)
  BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version) (pandas-dev#65133)
  DEPR: deprecate dates-with-datetime64 in _maybe_downcast_for_indexing (pandas-dev#64871)
  DOC: note that DataFrame.values is not writeable (pandas-dev#65142)
  CLN: Update groupby observed defaults (pandas-dev#65148)
  PERF: avoid materializing values[indexer] in Block.setitem (pandas-dev#64251)
  DOC: update GroupBy.sum/min/max See Also sections (pandas-dev#65144)

Development

Successfully merging this pull request may close these issues.

BUG: read_parquet fails with tz aware data

3 participants